Multiview Representation Learning via Deep CCA for Silent Speech Recognition
Abstract
Silent speech recognition (SSR) converts non-audio information, such as articulatory (tongue and lip) movements, to text. Articulatory movements generally carry less information than acoustic features for speech recognition, which may limit SSR performance. Multiview representation learning, which can learn better representations by analyzing multiple information sources simultaneously, has recently been used successfully in speech processing and acoustic speech recognition, but it has rarely been applied to SSR. In this paper, we investigate SSR based on multiview representation learning via canonical correlation analysis (CCA). When both acoustic and articulatory data are available during training, CCA can effectively learn a representation of articulatory movements from the multiview data. To further represent the complex structure of the multiview data, we apply deep CCA, in which the feature mapping takes the functional form of a deep neural network. This approach was evaluated on a speaker-independent SSR task using a data set collected from seven English speakers with an electromagnetic articulograph (EMA). Experimental results showed that multiview representation learning via deep CCA was more effective than both the CCA-based multiview approach and the baseline articulatory movement data on Gaussian mixture model and deep neural network-based SSR systems.
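The multiview idea in the abstract can be illustrated with a minimal sketch of linear CCA, not the deep CCA variant and not the authors' implementation. It assumes synthetic data in place of real EMA and acoustic features; the function names (`fit_linear_cca`, `project`), the feature dimensions, the noise level, and the regularization constant are all hypothetical choices made for illustration. Both views are used to fit the projection, and only the articulatory view is needed afterwards.

```python
import numpy as np

def fit_linear_cca(X, Y, k, reg=1e-4):
    """Fit linear CCA on two views; return view-1 mean, view-1 projection,
    and the top-k canonical correlations. `reg` is an assumed ridge term."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, _ = np.linalg.svd(Sxx_is @ Sxy @ Syy_is)
    A = Sxx_is @ U[:, :k]
    return mx, A, s[:k]

def project(X, mx, A):
    """Map view-1 features into the learned k-dimensional space."""
    return (X - mx) @ A

# Synthetic stand-ins (assumption): a shared 2-D latent drives both views.
rng = np.random.default_rng(0)
Wx = rng.normal(size=(2, 6))    # hypothetical "articulatory" view, 6 dims
Wy = rng.normal(size=(2, 10))   # hypothetical "acoustic" view, 10 dims

def make_views(n):
    z = rng.normal(size=(n, 2))
    X = z @ Wx + 0.1 * rng.normal(size=(n, 6))
    Y = z @ Wy + 0.1 * rng.normal(size=(n, 10))
    return X, Y

X_train, Y_train = make_views(2000)
X_test, _ = make_views(500)

# Training uses both views; the acoustic view is not needed afterwards.
mx, A, corrs = fit_linear_cca(X_train, Y_train, k=2)
Z_test = project(X_test, mx, A)
print("canonical correlations:", corrs)
print("projected test shape:", Z_test.shape)
```

In deep CCA, the linear maps would be replaced by neural networks trained to maximize the same correlation objective; the projected features would then feed a recognizer such as the GMM or DNN systems mentioned above.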
Similar resources
Deep Generalized Canonical Correlation Analysis
We present Deep Generalized Canonical Correlation Analysis (DGCCA) – a method for learning nonlinear transformations of arbitrarily many views of data, such that the resulting transformations are maximally informative of each other. While methods for nonlinear two-view representation learning (Deep CCA, (Andrew et al., 2013)) and linear many-view representation learning (Generalized CCA (Horst,...
Deep Multilingual Correlation for Improved Word Embeddings
Word embeddings have been found useful for many NLP tasks, including part-of-speech tagging, named entity recognition, and parsing. Adding multilingual context when learning embeddings can improve their quality, for example via canonical correlation analysis (CCA) on embeddings from two languages. In this paper, we extend this idea to learn deep non-linear transformations of word embeddings of ...
Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning
Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where it is expensive and time-consuming to obtain adequate labeled data. We propose a novel method named DALRRL, which consists of deep ...
Articulatory information and Multiview Features for Large Vocabulary Continuous Speech Recognition
This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a continuous large vocabulary speech recognition task. Mel-filterbank energies and perceptually motivated forced damped oscillator coefficient (DOC) features are used after feature-space maximum-likelihood linear regression (fMLLR) transforms, whic...
New Techniques in Deep Representation Learning
Galen Andrew. Chair of the Supervisory Committee: Associate Professor Emanuel Todorov (CSE, joint with AMATH). The choice of feature representation can have a large impact on the success of a machine learning algorithm at solving a given problem. Although human engineers employing task-specific domain knowledge still play a key role in feature engineeri...